Analyzing Noise Robustness of Cochleogram and Mel Spectrogram Features in Deep Learning Based Speaker Recognition

Abstract

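Speaker recognition systems perform very well on datasets without noise and mismatch. However, performance degrades with environmental noise, channel variation, and physical and behavioral changes in the speaker. The type of speaker-related feature plays a crucial role in improving the performance of speaker recognition systems. Gammatone Frequency Cepstral Coefficient (GFCC) features have been widely used to develop robust speaker recognition systems with conventional machine learning, where they achieved better performance than Mel Frequency Cepstral Coefficient (MFCC) features in noisy conditions. Recently, deep learning models have shown better performance in speaker recognition than conventional machine learning. Most previous deep learning-based speaker recognition models have used Mel Spectrogram and similar inputs rather than handcrafted features like MFCC and GFCC. However, the performance of Mel Spectrogram features degrades at high noise ratios and with mismatch in the utterances. Similar to the Mel Spectrogram, the Cochleogram is another important feature for deep learning speaker recognition models. Like GFCC features, the Cochleogram represents utterances on the Equivalent Rectangular Bandwidth (ERB) scale, which is important in noisy conditions. However, no studies have analyzed the noise robustness of Cochleogram and Mel Spectrogram features in speaker recognition. In addition, only limited studies have used the Cochleogram to develop speech-based models in noisy and mismatch conditions using deep learning. In this study, the noise robustness of Cochleogram and Mel Spectrogram features in deep learning-based speaker recognition is analyzed at Signal to Noise Ratio (SNR) levels from −5 dB to 20 dB. Experiments are conducted on the VoxCeleb1 dataset and a noise-added VoxCeleb1 dataset using basic 2DCNN, ResNet-50, VGG-16, ECAPA-TDNN, and TitaNet model architectures. Speaker identification and verification performance of both Cochleogram and Mel Spectrogram features is evaluated. The results show that...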
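The abstract mentions a noise-added dataset evaluated at SNR levels from −5 dB to 20 dB but does not say how the noise was mixed in. As an illustration only, the following minimal sketch shows one common way to add noise to a clean signal at a target SNR. The function names `mix_at_snr` and `measured_snr_db`, the 5 dB step between levels, and the synthetic sine and white-noise signals are assumptions made for this example; they are not taken from the paper.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it.

    Hypothetical helper for illustration; not the paper's own pipeline.
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Repeat or trim the noise so it covers the whole utterance.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)

    # SNR_dB = 10 * log10(P_speech / (gain^2 * P_noise)), solved for gain.
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


def measured_snr_db(clean, noisy):
    """Recover the SNR in dB from a clean signal and its noisy version."""
    residual = noisy - clean
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean(residual ** 2))


# Synthetic stand-ins: a 1 s sine "utterance" at 16 kHz and white noise.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 220.0 * t)
noise = rng.standard_normal(8000)

# Assumed grid: the abstract gives only the endpoints (-5 dB and 20 dB).
snr_levels = [-5, 0, 5, 10, 15, 20]
noisy_by_snr = {snr: mix_at_snr(clean, noise, snr) for snr in snr_levels}
```

Because the gain is computed from the powers of the exact signals being mixed, `measured_snr_db(clean, noisy_by_snr[snr])` returns the requested level up to floating-point error. With real recordings, the same scaling would be applied per utterance before extracting Mel Spectrogram or Cochleogram features.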

Similar resources

Robustness of phase based features for speaker recognition

This paper demonstrates the robustness of group-delay based features for speech processing. An analysis of group delay functions is presented which shows that these features retain formant structure even in noise. Furthermore, a speaker verification task performed on the NIST 2003 database shows lower error rates compared with the traditional MFCC features. We also mention about using feat...

Noise robust feature for automatic speech recognition based on mel-spectrogram gradient histogram

This paper proposes an alternative scheme for extracting speech features in an automatic speech recognition (ASR) system. If an ASR system is trained using a clean speech source, a noisy environment may cause a mismatch between the features from the recognition data and those from the training data. This mismatch deteriorates the recognition accuracy. Thus, unlike in existing speech features, a...

Learning Binaural Spectrogram Features for Azimuthal Speaker Localization

Spatial localization of speech and other natural sounds with rich spectro-temporal structure is a computationally challenging task. It requires extraction of features which are informative about the speaker’s position and yet invariant to the sound level and spectral modulation present in the signal. This paper demonstrates that this can be achieved with Independent Component Analysis (ICA) applied to ...

Modulation spectrogram features for improved speaker diarization

We propose the use of modulation spectrogram features in speaker diarization. These features carry longer term characteristics of the acoustic signals than the widely used MFCCs, thus providing potential improvement by using both features in combination. Using the state-of-the-art ICSI speaker diarization system, an improvement of 20.77% relative DER is obtained on the NIST Rich Transcription 2...

Journal

Journal title: Applied Sciences

Year: 2022

ISSN: 2076-3417

DOI: https://doi.org/10.3390/app13010569